Efficient Pattern Matching in Elastic-Degenerate Strings

Authors

  • Costas S. Iliopoulos
  • Ritu Kundu
  • Solon P. Pissis

Abstract

In this paper, we extend the notion of gapped strings to elastic-degenerate strings. An elastic-degenerate string can be seen as an ordered collection of k > 1 seeds (substrings/subpatterns) interleaved by elastic-degenerate symbols, such that each elastic-degenerate symbol corresponds to a set of two or more variable-length strings. Here, we present an algorithm for solving the pattern matching problem with a (solid) pattern and an elastic-degenerate text, running in O(N+αγnm) time, where m is the length of the given pattern; n and N are the length and total size of the given elastic-degenerate text, respectively; and α and γ are small constants representing, respectively, the maximum number of strings in any elastic-degenerate symbol of the text and the largest number of elastic-degenerate symbols spanned by any occurrence of the pattern in the text. The space used by the algorithm is linear in the size of the input for a constant number of elastic-degenerate symbols in the text. In real applications, α and γ are so small that the algorithm is expected to work very efficiently in practice.
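
To make the problem statement concrete, here is a minimal Python sketch of a naive matcher. It is not the O(N+αγnm) algorithm described in the abstract and makes no attempt to achieve that bound; it only tracks which prefixes of the pattern can be carried across segment boundaries. It assumes a hypothetical representation in which the text is a list of segments, each a set of strings (seeds as singleton sets, elastic-degenerate symbols as sets that in this sketch may include the empty string), and it reports the index of every segment where at least one occurrence of the pattern ends. The function name `ed_match` and the `EDText` alias are illustrative only.

```python
from typing import List, Sequence, Set

# Hypothetical input format (an assumption, not taken from the paper):
# the elastic-degenerate text is a list of segments, each a set of strings.
# A seed is a singleton set; an elastic-degenerate symbol is a set of two or
# more variable-length strings, which in this sketch may include "".
EDText = Sequence[Set[str]]


def ed_match(pattern: str, text: EDText) -> List[int]:
    """Naive sketch: return the indices of segments in which at least one
    occurrence of `pattern` ends. Not the paper's O(N + αγnm) algorithm."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    ends: List[int] = []
    # active holds lengths j (1 <= j < m) such that pattern[:j] is spelled by
    # some choice of strings ending at the previous segment.
    active: Set[int] = set()
    for i, segment in enumerate(text):
        found = False
        nxt: Set[int] = set()
        for s in segment:
            # Case 1: the whole occurrence lies inside s.
            if pattern in s:
                found = True
            # Case 2: extend a partial match carried over from earlier segments.
            for j in active:
                rest = pattern[j:]
                if len(s) >= len(rest):
                    if s.startswith(rest):
                        found = True  # the occurrence ends in this segment
                elif rest.startswith(s):
                    nxt.add(j + len(s))  # s (possibly "") is consumed entirely
            # Case 3: a non-empty suffix of s that is a proper prefix of the
            # pattern starts a new partial match.
            for p in range(max(0, len(s) - m + 1), len(s)):
                if pattern.startswith(s[p:]):
                    nxt.add(len(s) - p)
        if found:
            ends.append(i)
        active = nxt
    return ends


T = [{"ACG"}, {"T", "TTC", ""}, {"GAT"}]
print(ed_match("GTG", T))  # [2]
```

Here the only occurrence of `GTG` comes from choosing `"ACG" + "T" + "GAT"`, which spells `ACGTGAT`, so the match ends in segment 2. Because partial matches are kept as a set of prefix lengths, an occurrence can also pass through an empty string: `ed_match("ACGGA", T)` finds `ACGGA` in `"ACG" + "" + "GAT"`.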

Similar resources

Efficient pattern matching in degenerate strings with the Burrows-Wheeler transform

A degenerate or indeterminate string on an alphabet Σ is a sequence of non-empty subsets of Σ. Given a degenerate string t of length n, we present a new method based on the Burrows–Wheeler transform for searching for a degenerate pattern of length m in t running in O(mn) time on a constant size alphabet Σ. Furthermore, it is a hybrid pattern-matching technique that works on both regular and dege...

Linear Algorithm for Conservative Degenerate Pattern Matching

A degenerate symbol x̃ over an alphabet Σ is a non-empty subset of Σ, and a sequence of such symbols is a degenerate string. A degenerate string is said to be conservative if its number of non-solid symbols is upper-bounded by a fixed positive constant k. We consider here the matching problem of conservative degenerate strings and present the first linear-time algorithm that can find, for given ...

Abelian pattern matching in strings

Abelian pattern matching is a new class of pattern matching problems. In abelian patterns, the order of the characters in the substrings does not matter, e.g. the strings abbc and babc represent the same abelian pattern a+2b+c. Therefore, unlike classical pattern matching, we do not look for an exact (ordered) occurrence of a substring, rather the aim here is to find any permutation of a given ...

Efficient Pattern Matching on Binary Strings

The binary string matching problem consists in finding all the occurrences of a pattern in a text where both strings are built on a binary alphabet. This is an interesting problem in computer science, since binary data are omnipresent in telecom and computer network applications. Moreover the problem finds applications also in the field of image processing and in pattern matching on compressed ...

Flexible and Efficient Algorithms for Abelian Matching in Strings

The abelian pattern matching problem consists in finding all substrings of a text which are permutations of a given pattern. This problem finds application in many areas and can be solved in linear time by a naïve sliding window approach. In this short communication we present a new class of algorithms based on a new efficient fingerprint computation approach, called Heap-Counting, which turns ...

Journal:
  • CoRR

Volume: abs/1610.08111

Publication date: 2016